A Neuro-Symbolic ASP Pipeline for Visual Question Answering

Abstract

We present a neuro-symbolic visual question answering (VQA) pipeline for CLEVR, a well-known dataset that consists of pictures showing scenes with objects and questions related to them. Our pipeline covers (i) training neural networks for object classification and bounding-box prediction on the CLEVR scenes, (ii) statistical analysis of the distribution of prediction values to determine a threshold for high-confidence predictions, and (iii) a translation of CLEVR questions and network predictions that pass confidence thresholds into logic programmes, so that we can compute answers using an answer-set programming solver. By exploiting choice rules, we consider deterministic and non-deterministic scene encodings. Our experiments show that the non-deterministic encoding achieves good results even if the networks are trained rather poorly, in comparison with the deterministic approach. This is important for building robust VQA systems when network predictions are less than perfect. Furthermore, restricting non-determinism to reasonable choices allows for more efficient implementations than related neuro-symbolic approaches without losing much accuracy.
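
The difference between the deterministic and the non-deterministic scene encoding can be illustrated with a minimal sketch. It is not the authors' implementation: the predicate name obj/2, the input format (one dictionary of label scores per detected object), the example threshold, and the fallback to the top-scoring label when nothing passes the threshold are all assumptions made for illustration. The sketch only produces ASP text; passing it to a solver is left out.

```python
from typing import Dict, List

# Hypothetical input format: one dict per detected object, mapping a class
# label (a lowercase ASP constant) to the network's confidence score.
Prediction = Dict[str, float]


def deterministic_encoding(predictions: List[Prediction]) -> List[str]:
    """Emit one ASP fact per object, keeping only the top-scoring label."""
    rules = []
    for obj_id, scores in enumerate(predictions):
        best = max(scores, key=scores.get)
        rules.append(f"obj({obj_id},{best}).")
    return rules


def nondeterministic_encoding(
    predictions: List[Prediction], threshold: float
) -> List[str]:
    """Emit a choice rule over every label whose score passes the threshold.

    Assumption: if no label passes, fall back to the top-scoring label so
    that each object still receives exactly one class in every answer set.
    """
    rules = []
    for obj_id, scores in enumerate(predictions):
        candidates = sorted(
            label for label, score in scores.items() if score >= threshold
        )
        if not candidates:
            candidates = [max(scores, key=scores.get)]
        if len(candidates) == 1:
            # Only one plausible label: a plain fact is enough.
            rules.append(f"obj({obj_id},{candidates[0]}).")
        else:
            # Choice rule with bounds 1..1: the solver picks exactly one label.
            atoms = "; ".join(f"obj({obj_id},{label})" for label in candidates)
            rules.append(f"1 {{ {atoms} }} 1.")
    return rules


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    example = [
        {"cube": 0.55, "sphere": 0.40, "cylinder": 0.05},
        {"cube": 0.02, "sphere": 0.01, "cylinder": 0.97},
    ]
    print("\n".join(deterministic_encoding(example)))
    print("\n".join(nondeterministic_encoding(example, threshold=0.3)))
```

In this sketch the threshold bounds how many alternatives the solver has to consider per object, which mirrors the abstract's point that restricting non-determinism to reasonable choices keeps the reasoning efficient.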

Related resources

Investigating Embedded Question Reuse in Question Answering

The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance by reusing information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...

Revisiting Visual Question Answering Baselines

Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to support “reasoning”. For multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict ...

iVQA: Inverse Visual Question Answering

In recent years, visual question answering (VQA) has become topical as a long-term goal to drive computer vision and multi-disciplinary AI research. The premise of VQA’s significance is that both the image and the textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps ‘understand’ less than initially hoped, and in...

Differential Attention for Visual Question Answering

In this paper we aim to answer questions based on images when provided with a dataset of question-answer pairs for a number of images during training. A number of methods have focused on solving this problem by using image based attention. This is done by focusing on a specific part of the image while answering the question. Humans also do so when solving this problem. However, the regions that...

Interpretable Counting for Visual Question Answering

Questions that require counting a variety of objects in images remain a major challenge in visual question answering (VQA). The most common approaches to VQA involve either classifying answers based on fixed length representations of both the image and question or summing fractional counts estimated from each section of the image. In contrast, we treat counting as a sequential decision process ...

Journal

Journal title: Theory and Practice of Logic Programming

Year: 2022

ISSN: 1471-0684, 1475-3081

DOI: https://doi.org/10.1017/s1471068422000229